Reproducible Research: A Bioinformatics Case Study

Author: Gentleman Robert
Publication venue: Collection of Biostatistics Research Archive
Publication date: 20/05/2004

While scientific research and the methodologies involved have gone through substantial technological evolution, the technology involved in the publication of the results of these endeavors has remained relatively stagnant. Publication is largely done in the same manner today as it was fifty years ago. Many journals have adopted electronic formats; however, their orientation and style are little different from those of a printed document. The documents tend to be static and take little advantage of computational resources that might be available. Recent work, Gentleman and Temple Lang (2004), suggests a methodology and basic infrastructure that can be used to publish documents in a substantially different way. Their approach is suitable for the publication of papers whose message relies on computation. Stated quite simply, Gentleman and Temple Lang propose a paradigm where documents are mixtures of code and text. Such documents may be self-contained or they may be a component of a compendium, which provides the infrastructure needed to give access to data and supporting software. These documents, or compendiums, can be processed in a number of different ways. One transformation will be to replace the code with its output, thereby providing the familiar, but limited, static document. In this paper we apply these concepts to a seminal paper in bioinformatics, namely The Molecular Classification of Cancer, Golub et al. (1999). The authors of that paper have generously provided data and other information that have allowed us to largely reproduce their results. Rather than reproduce this paper exactly, we demonstrate that such a reproduction is possible and instead concentrate on demonstrating the usefulness of the compendium concept itself.

Collection Of Biostatistics Research Archive

Bioinformatics Software Engineering

Author: Gentleman Robert
Publication venue: Foundation for Open Access Statistics
Publication date: 01/03/2005

Abstracts not available for BookReview

Directory of Open Access Journals

Journal of Statistical Software

Developing Statistical Software in FORTRAN 95

Author: Robert Gentleman

Research Papers in Economics

Extensions to Gene Set Enrichment

Author: Gentleman Robert
Jiang Zhen
Publication venue: Collection of Biostatistics Research Archive
Publication date: 02/08/2006

Motivation: Gene Set Enrichment Analysis (GSEA) has been developed recently to capture moderate but coordinated changes in the expression of sets of functionally related genes. We propose a number of extensions to GSEA, which use different statistics to describe the association between genes and the phenotype of interest. We make use of dimension reduction procedures, such as principal component analysis, to identify gene sets containing coordinated genes. We also address the problem of overlap among gene sets in this paper. Results: We applied our methods to data from a clinical trial in acute lymphoblastic leukemia (ALL) [1]. We identified interesting gene sets using different statistics. We find that gender may have effects on gene expression in addition to the phenotype effects. Investigating overlap among interesting gene sets indicates that overlap could alter the interpretation of the significant results.

Collection Of Biostatistics Research Archive

Making the most of high-throughput protein-interaction data

Author: Gentleman Robert
Huber Wolfgang
Publication venue: BioMed Central
Publication date: 01/01/2007

Better methods of statistical analysis could make large-scale protein-interaction data more useful

Crossref

PubMed Central

Assessing The Role Of Multi-protein Complexes In Determining Phenotype

Author: Gentleman Robert
Le Meur Nolwenn
Publication venue: Collection of Biostatistics Research Archive
Publication date: 16/01/2008

Understanding regulatory mechanisms in complex biological systems is an important challenge, in particular to understand disease mechanisms, and to discover new therapies and drugs. In this paper, we consider the important question of cellular regulation of phenotype. Using single gene deletion data, we address the problem of linking a phenotype to underlying functional roles in the organism and provide a sound computational and statistical paradigm that can be extended to address more complex experimental settings such as multiple deletions. We apply the proposed approaches to publicly available data sets to demonstrate strong evidence for the involvement of multi-protein complexes in the phenotypes studied

Collection Of Biostatistics Research Archive

Statistical Analyses and Reproducible Research

Author: Gentleman Robert
Temple Lang Duncan
Publication venue: Collection of Biostatistics Research Archive
Publication date: 29/05/2004

For various reasons, it is important, if not essential, to integrate the computations and code used in data analyses, methodological descriptions, simulations, etc. with the documents that describe and rely on them. This integration allows readers to both verify and adapt the statements in the documents. Authors can easily reproduce them in the future, and they can present the document's contents in a different medium, e.g. with interactive controls. This paper describes a software framework for authoring and distributing these integrated, dynamic documents that contain text, code, data, and any auxiliary content needed to recreate the computations. The documents are dynamic in that the contents, including figures, tables, etc., can be recalculated each time a view of the document is generated. Our model treats a dynamic document as a master or "source" document from which one can generate different views in the form of traditional, derived documents for different audiences. We introduce the concept of a compendium as both a container for the different elements that make up the document and its computations (i.e. text, code, data, ...), and as a means for distributing, managing and updating the collection. The step from disseminating analyses via a compendium to reproducible research is a small one. By reproducible research, we mean research papers with accompanying software tools that allow the reader to directly reproduce the results and employ the methods that are presented in the research paper. Some of the issues involved in paradigms for the production, distribution and use of such reproducible research are discussed.

Collection Of Biostatistics Research Archive

Modeling synthetic lethality

Author: Gentleman Robert
Le Meur Nolwenn
Publication venue: BioMed Central
Publication date: 01/01/2008

Using new computational tools in yeast, multi-protein complexes were identified that share an unusually high number of synthetic genetic interactions

HAL-CentraleSupelec

Crossref

Springer - Publisher Connector

INRIA a CCSD electronic archive server

PubMed Central

HAL-Rennes 1

Visualizing Genomic Data

Author: Gentleman Robert
Hahne Florian
Huber Wolfgang
Publication venue: Collection of Biostatistics Research Archive
Publication date: 01/02/2006

The advent of experimental techniques capable of probing biomolecules and cells at high levels of resolution has led to a rapid change in the methods used for the analysis of experimental molecular biology data. In this article we give an overview of visualization techniques and methods that can be used to assess various aspects of genomic data.

Collection Of Biostatistics Research Archive